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A simplified form of the time dependent evolutionary dynamics of a quasispecies model with 
a rugged fitness landscape is solved via a mapping onto a random flux model whose asymptotic 
behavior can be described in terms of a random walk. The statistics of the number of changes 
of the dominant genotype from a finite set of genotypes are exactly obtained confirming existing 
conjectures based on numerics. 
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In evolution, long periods of stasis or inactivity are 
unctuated by bursts of rapid activity. Fossil records 
1| reveal this basic pattern in the evolution of biological 
species and the same behavior is observed in the devel- 
opment of microbial populations 0] and artificial life 
Not surprisingly, the dynamics of genetic algorithms 
also exhibits this punctuated behavior. In this paper we 
will show how a simple model of biological evolution can 
be exactly solved using a mapping onto a random flux 
model. The important asymptotic details of this random 
flux model can then be determined in terms of the first 
passage time distribution of a random walk. 

The model we study was introduced in § as a sim- 
plified version of the quasispecies model which is used 
for the study of large populations of replicating macro- 
molecules [g. In 0, the quasispecies model was studied 
in the strong selection limit where the location in the 
space of genotypes is defined as the genotype having the 
largest population. A shell model |5| may be derived in 
the strong selection limit and a further simplification of 
this model leads to the i.i.d. (independent and identi- 
cally distributed) shell model where the natural space of 
genotypes, which is that of binary sequences, is replaced 
by a one dimensional lattice. Rather than re-derive the 
model we shall describe it and the reader will immedi- 
ately see that it can be reinterpreted in terms of a simple 
evolutionary process. 

We consider an ensemble of A different genotypes la- 
beled by i = 1, 2, . . . N. The fitness of a genotype is given 
by its effective rate of reproduction per individual Vi > 
and thus the size of the population at time t is given by 
rii(t) — n^O) exp(i^i). In terms of logarithmic variables, 
yi(t) — ln(rii(<)) = ln(rii(0)) + Vit. One can interpret 
Ui(t) as the trajectory of a particle moving ballistically 
with a non- negative velocity Vi, starting from its initial 
position ?/i(0). The i.i.d. version of the shell model ||, 
which we will call the leader model, is defined as follows: 
we draw A velocities {vi}i<i<N independently from the 
same probability distribution p(v) (which has positive 
support only). We then consider the semi- infinite lines 
of slope Vi describing the evolution of genotype i (up to 



an overall constant) 



Ui(t) = -i + Vi t. 



(1) 



At any time t > 0, the leader is defined as the genotype 
i having the maximum y.i(t), the corresponding i is thus 
the most populated genotype at time t. The choice of 
Ui(0) = —i comes from the details of the original qua- 
sispecies model 0. Thus, the evolution of the trajec- 
tories is completely deterministic, the only randomness 
comes from the velocities. Obviously at t = 0, y\ is the 
leader; however if v\ is not the maximal velocity, then y\ 
will ultimately be overtaken by a faster/fitter genotype. 
At each of these overtaking events the number of geno- 
types which have been leaders increases by one, finally 
the fastest genotype will become the final leader and no 
more leader changes will occur. In the general context of 
evolutionary processes these over takings correspond to 
punctuation events. 

The total number of lead changes is denoted by In and 
we denote by Wk the velocity of the leading genotype af- 
ter the fc-th lead change. Clearly In is a random vari- 
able, varying from one realization of velocities to another. 
Based on simulations, it was observed that for large 
A, (In) » j3 In AT. where, remarkably, the coefficient (3 is 
rather robust and depends only on the tails of the distri- 
bution p(v). Based on numerics, Krug and Karl 5] made 
some conjectures about the value of (3 and also showed 
how a comparison with record statistics gives the upper 
bound (3 < 1. Similar logarithmic growth of the aver- 
age number of lead changes has also been reported Q 
recently in the context of growing networks where the 
leader is the maximally connected node. 

In this letter, we present an exact solution to this prob- 
lem, confirming the conjectures of 5]. Moreover, we cal- 
culate the variance of In and show that ((In — (In)) 2 ) ~ 
7 In A for large A, where the coefficient 7 is calculated 
exactly and shown to be as robust as (3. We also show 
that the full distribution of / n around its mean is asymp- 
totically Gaussian. The key observation that leads to 
the exact solution of this model is a mapping onto a 
random flux model whose late time properties are iden- 
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FIG. 1: Model trajectory for N = 4. Genome 4 becomes 
the ultimate leader and there is only one leadership change 
indicated by the black dot. 



tical to those of the original model. Here, the veloc- 
ity distribution is chosen as before but instead of fixing 
the initial positions yi{0) of the genotype i at — i, we 
chose it to be a random variable uniformly distributed on 
[0, —N]. From a coarse grained point of view, for a large 
number of genotypes, this difference in the initial condi- 
tion is not expected to change the asymptotic properties. 
In the context of the quasispecies model, this random 
initial condition translates to having the initial popula- 
tion of each genotype having a probability distribution: 
Prob(rii(0) = x) = (xN)~ x , with exp(-iV) < x < 1. An 
example set of trajectories for N = 4 and where In = 2 is 
shown in Fig. (1). If w k is the velocity of the fc-th leader 
then clearly only genotypes with velocities greater than 
Wk can become subsequent leaders. From the rest frame 
of the leader, in the next time interval At the genotype 
i, with velocity x>i (> w k ), will overtake the leader if it is 
at a distance Ax = (uj — Wk)At behind the leader. The 
rate at which the genotype i becomes the new leader is 
thus given by 

n = (Vi - w k )(S (jfc(O) - j/i(0) - (vi - Wk)t)), (2) 

where the angled brackets indicate the average over the 
initial conditions. Given that j/fc(0) > yi(0) the initial 
distance di k — y k (0) — ?/i(0), between the genotypes i 
and k, is a random variable also uniformly distributed 
over [0, N + y k (0)] and consequently the average of the 
delta function in the above expressions is equal to one and 
independent of time. The probability that the genotype 
i (with Vi > w k ) becomes the next leader is given by 
r il Tli-i r j which we write as a transition probability 



Pk^i 



(vi - w k )0(v l - Wk) 



^N , 



w k )9(vj 



(3) 



This rather intuitive rule appears in a simple traffic 
model studied in although the physics is different 



to that here because on catching up with a slower car 
the faster one then adopts the same speed. We next 
show that this model can be mapped onto a first-passage 
problem for a random process. Notice that, once the k- 
th leader is selected with velocity w k , the number Nk of 
possible future leaders is 



Nk 
N 



JV 



(Vj - w k ) 



N 



(4) 



where TV is the total number of genotypes. In the limit 
of large N, one can replace the right hand side of Eq. Q 
by the integral over v, 



Nk 
N 



p(v) dv = P(w k ) 



(5) 



which is exact up to 0(1 JvN) corrections and where 
P(v) — J max p(u)du is the cumulative velocity distribu- 
tion. Clearly, the number of lead changes In is the value 
of k where Nk = 1. This gives, P(wi N ) = l/N and hence 

-ln[P(^J]=lniV. (6) 

We define = — ln[P(u>fc)] whose evolution is given by 

Y k+ i=Y k +ik, (7) 



where clearly 



\n[P{wk+i)/P{wk)}. 



(8) 



Thus Yk can be interpreted as the position of a random 
walker at time k and its time evolution is given by the 
Langevin equation (JJJ where £j. is the noise at step k. 
This redefinition is not yet very useful since the noise 
at step k depends on Yk+i and Yk- However, as we will 
see, for large k the probability distribution of the noise 
£/c becomes independent of k and w k and has a finite 



mean 



(6 



H and variance ([£& - (£fc)]" 



a , that 

can be computed explicitly for arbitrary velocity distri- 
bution p(v). For large k, Eq. (JJJ represents a discrete 
time random walk with a positive drift fj,, i.e., 



Yfc+i = Y k + (j, + (jf] k 



(9) 



where rjk is a noise with zero mean (rjk) — and unit vari- 
ance. We will also see that rjk's are not only completely 
independent of Wk for large fc, they are also uncorrelated 
at different times. Thus Eq. 10 is a true Markovian evo- 
lution of a discrete time random walker with a positive 
drift fj,. Obviously then, by central limit theorem, Yk will 
have a Gaussian distribution with mean (Y k ) = [ik and 
variance (Y k 2 ) — (Y k ) = a 2 k. 

Once we have the Markovian random walker evolution 
as in Eq. it follows from Eq. © that the number of 
lead changes In is just the first time the process Y k (start- 
ing at some initial value Yq) hits the level Y — ln(iV). 
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Thus the distribution of In is simply the distribution 
of the first-passage time to the level Y = ln(iV). To 
compute this, it is convenient to define Zk = IniV — Y k - 
Then Z^s evolve via, Zk+i = Z^ — fi — crrjk starting from 
Zq = In AT — Yq. Thus Zk is the position of a random 
walker at step k with a negative drift — /j, towards the 
origin and In now represents the first-passage time to 
the origin starting from the initial position Zq. Now, for 
large k, the discrete-time random walker can be replaced 
by a continuous-time Brownian motion, 



dZ_ 

dt 



-fj, + ar)(t) 



(10) 



where 77 is a white noise with (rj(t)) = and (rj(t)rj(t')) = 
5(t — t'). For such a process, the distribution P(tf\Z ) of 
the first-passage time tj to the origin is known exactly |9j 
and we can apply it here to obtain the probability that 
In = k is given by 



Q(k) 



In TV 



exp 



2a 2 k 



(k-(lnN)/(xy 



(11) 



Note that this distribution of In is non-Gaussian. How- 
ever, we expect this result to be valid only in the vicinity 
of k ps In iV//i, i.e., near its mean. This can be traced 
back to the fact that in deriving this result we replaced a 
discrete-time random walk by a continuous-time Brown- 
ian process. Near its mean, using k « IniV in Eq. 
the distribution of In becomes a Gaussian 



Q(k) 



3/2 



<jV2tt IniV 



exp 



/< 



2<t 2 IniV 



(k-(\nN)/^y 



with mean and variance (for large N) given by 



(12) 



1 



(l N ) = (3\nN; where = - (13) 

2 

((In -(In)) 2 ) = 7 IniV; where 7 = ^. (14) 

M 

Thus, irrespective of the velocity distribution p(v), the 
distribution of In near its mean is is a universal Gaus- 
sian characterized by two parameters /1 and a. The only 
dependence on p(v) appears through the two constants 
[a and cr. 

To calculate the mean p, and the variance a 2 of the 
noise defined in Eq. |H1 we note that for a given Wk , 
£k is a random variable since Wk+i is a random variable 
drawn from the distribution in Eq. We define 



J(v) 
K(v) 
L(v) 



P(u) du (15) 
[P'(u)/P(u)]J(u)du (16) 
[P'(u)/P(u)]K(u)du. (17) 



Using the definition in Eq. JSJ and the transition proba- 
bility in Eq. the mean of (for a given Wk) is 

= f w ^[HP(v)) - HP(wk))}(v - w k )p(v)dv 

(18) 

Using integration by parts, in both the numerator and 
denominator above we find 



(4) = 1 



K(w k ) 
J(wk) ' 



(19) 



where the function K(v) is defined in Eq. 1|16|) . The sec- 
ond moment is given by 



(e k ) 



/*-~pn(P(«)) - \n(P(w k ))] 2 (v - w k )p{v) dv 



fy°*(v-w k )p(v)dv 



and a similar calculation leads to 

L(w k ) 



<(a-<&)) 2 ) = l + 2 



J(w k ) 



K(w k ) 



J(w k ) 



(20) 



(21) 



where the functions J, K and L are defined in Eqs. 
((TBjl and (fT7|) respectively. 

We now consider the three classes of distributions con- 
sidered by 

(i) Fast decaying distribution with w max = +00: In 
this case, it is easy to see that for large u, 



P'(u) _ J'(u) 
P(u) ~ J(u) 



(22) 



Thus, using this result in the definition of K(v) in 
Eq. i|16|) one finds that for large Wk 



K(w k ) 



P'(u) 



Similarly, for large Wk, 

P'(u) 



J(u)duK -J(w k ) (23) 



L(w k ) 



P(u) 



K(u) du « J(wk) 



(24) 



Using these results in Eqs. (|19|) and l|21|l we find for large 
k 



((^-(&)) 2 > 



(25) 
(26) 



Thus, as stated earlier, we see the variance become inde- 
pendent of k and Wk- 

(ii) Distribution with a finite u m ax, with p(v) ~ 
I ln(w m ax — w)| 7 (w m ax — v) a : In this case, for u close to 
Wmax, we find 



P'{u) _ / 1 + cA J>) 
P(u) ~ \2 + a) J(u) 



(27) 
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and it follows that for Wk close to v 
K(w k ) 1 ' 

L(w k ) 



1 



2 + a 



J(Wk) 



J(w k ) 



Using these results in Eqs. (|T§)) and lj2T|) we get 

2a + 3 



(6) 

<(6-<6» 2 ) 



a + 2 
2a 2 + 6a - 
(a + 2f 



(28) 

(29) 
(30) 



(Hi) Power-law decaying distribution with w max = 

+oo, and p(v) ~ \n(v) 7 v~ a with a > 2: In this case, 
for large u 



P'(u) 
P[u) 



a 



1\ J>) 

J[u) 



(31) 



Using this result in the definition of K(v) and L(v) one 
easily finds that for large w k 



K(w k ) 
L(w k ) 



1 



f 



J(w k ) 



J(wk) 



Using these results in Eqs. (|I9|I and l|21(l we get 

2a- 3 



(6c) =n = 

((&-(6» 2 > =° 2 = 



a-2 
2a 2 - 6a 



(a 



2f 



(32) 

(33) 
(34) 



that for all these veloc- 

,2 



One can also demonstrate |1C_, 
ity distributions, and for large k and k' (£,kCk) ~ P 2 ~~ * 0, 
indicating that the noise £/c's become completely uncorre- 
cted in time. Thus Eq. truly represents a Markovian 
random walk with drift /i. Knowing the exact values of 
fi and <7, we then find that distribution of In, near its 
mean, is given by the Gaussian in Eq. I|12|l with mean 
and variance given by Eqs. (|14|l . The coefficients (3 and 
7 are thus calculated exactly knowing /i and a and are 
given, for each of the cases mentioned above, by 



(«) 
(Hi) 



1/2 ; 7 = 

a + 2 
2a + 3 ' 

a-2 
2a -3 ' 



1/4 



7 = 



7 



(a + 2)(2a 2 + 6a + 5) 

(2a + 3) 3 
(a-2)(2a 2 -6a 



(35) 
(36) 



5) 



(2a - 3) 3 



(37) 



The results for the coefficient 8 are in complete agree- 
ment with those conjectured in |5j in all three cases and 
we have further verified all our results by simulating the 



original i.i.d. shell model with an algorithm which 
mits us to simulate up to N — 10 200 genotypes 
Moreover, we have also calculated the variance exactly 
and shown that near its mean, the distribution of In 
is a universal Gaussian. In Q it was pointed ou that 
the variance of In is typically smaller than the mean 
indicating the temporal correlation between leadership 
changes, this is clearly seen in our exact results. Away 
from its mean, one expects to see departures of the distri- 
bution of In away from the Gaussian form. To compute 
the full distribution one needs to solve the first-passage 
problem for the discrete-time process without resorting 
to the continuous-time approximation. Fortunately, for 
our discrete-time process, this can be achieved by observ- 
ing that the the evolution of Yfc with k, though random, is 
actually a strictly monotonic process. This follows from 
Eq. (JSJ that shows that the noise is always positive. 
The distribution of the first-passage time / n to the level 
ln(iV) then satisfies the identity [lfj 



Prob(/Ar <k)= Prob(Yfe > ln(iV)). 



(38) 



This gives Q{k) = Prob^ = k) = Prob(/jv < k + 
1) - Prob(l N < k) = Prob(y fc+ i > In(JV)) - Prob(Y fe > 
ln(iV)). Thus, a knowledge of the distribution of 
(which is usually much simpler to compute) provides us 
with an exact distribution of lead changes Q(k) for all 
k. For example, for an exponential velocity distribution 
p(v) = e~ v , the probability density function of Y% can be 
found explicitly for all k 



Pk{y) 



2fe-l 



[2k- 1) 



exp(-y). 



(39) 



This result is in fact asymptotically valid for any rapidly 
decaying distribution p(v) ^(|- Using this result we thus 
obtain the full probability distribution of In for the ex- 
ponential velocity distribution as 



Q(k) = 



(MAO) 
N(2k)\ 



21, 



1 



ln(7V) 



2k + 1 



(40) 



In Fig. (2) we show the predictions of Eq. l{4T)ll versus 
the results of extensive simulations and the agreement is 
perfect. The above use of the monotonicity of also 
enables one to obtain analytical results, away from the 
Gaussian regime, for generic fitness distributions [lo|. 

To summarize we have solved exactly the asymptotic 
statistics of lead changes in a quasispecies evolution 
model by mapping the model to a random flux model. 
Our results confirm previous conjectures about the mean 
number of leader changes. We have also computed the 
variance exactly and shown that the distribution is gener- 
ically Gaussian in the region around the mean. Finally, 
we remark that the evolution time r defined as the time 
when the last leader change occurs can be shown to have 
a distribution q[r) ~ r~ 2 for large r |l0| . as found in 
more realistic models 0- 
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FIG. 2: Plot of the distribution Q(k) of and In (circles), for 
N = 10 20 generated from 2.10 s samples with velocities taken 
from an exponential distribution. Also shown is the result 
Eq. 1140 \ (solid line), the result Eq. HI IB (dotted lines) and 
the Gaussian result Eq. Ill 21 (dashed line). 
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